Information Causality as a Physical Principle 



Marcin Pawłowski^1, Tomasz Paterek^2, Dagomir Kaszlikowski^2,
Valerio Scarani^2, Andreas Winter^{2,3} and Marek Żukowski^1

1) Institute of Theoretical Physics and Astrophysics, University of Gdańsk, 80-952 Gdańsk, Poland
2) Centre for Quantum Technologies and Department of Physics, National University of Singapore, 3 Science Drive 2, 117543 Singapore, Singapore
3) Department of Mathematics, University of Bristol, Bristol BS8 1TW, United Kingdom

Quantum physics exhibits remarkable distinguishing characteristics. For example, it gives only probabilistic predictions (non-determinism) and does not allow copying of unknown states (no-cloning [1]). Quantum correlations may be stronger than any classical ones [2], yet information cannot be transmitted faster than light (no-signaling). However, these features do not single out quantum physics. A broad class of theories exists that share these traits with quantum mechanics while allowing even stronger-than-quantum correlations [3]. Here, we introduce the principle of Information Causality. It states that the information that Bob can gain about a data set of Alice that was previously completely unknown to him, by using all his local resources (which may be correlated with her resources) and classical communication from her, is bounded by the information volume of the communication. In other words, if Alice communicates $m$ bits to Bob, the total information access that Bob gains to her data is not greater than $m$. For $m = 0$, Information Causality reduces to the standard no-signaling principle. We show that this new principle is respected in both classical and quantum physics, whereas it is violated by all no-signaling correlations that are stronger than the strongest quantum correlations. Maximally strong no-signaling correlations would allow Bob access to any $m$-bit subset of the whole data set held by Alice. If only one bit is sent by Alice ($m = 1$), this is tantamount to Bob being able to access the value of any single bit of Alice's data (but, of course, not all of them). We suggest that Information Causality, a generalization of no-signaling, might be one of the foundational properties of Nature.



Classical (as opposed to quantum) physics rests on the assumption that all physical quantities have well-defined values simultaneously. Relativity is based on clear-cut physical statements: the speed of light and the electric charge are the same for all observers. In contradistinction, the definition of quantum physics is still rather a description of its formalism: the theory in which systems are described by Hilbert spaces and dynamics is reversible. This situation is all the more unexpected as quantum physics is the most successful physical theory and quite a lot is known about it. Some of its counterintuitive features are almost popular knowledge: all scientists, and many laymen as well, know that quantum physics predicts only probabilities, that some physical quantities (such as position and momentum) cannot be simultaneously well defined, and that the act of measurement generically modifies the state of the system. Entanglement and no-cloning are rapidly claiming their place in the list of well-known quantum features; coming next in the queue are the feats of quantum information, such as the possibility of secure cryptography [4, 5] or the teleportation of unknown states [6].

These features are so striking that one could hope some of them provide the physical ground behind the formalism. Is quantum physics, for instance, the most general theory that allows violations of Bell inequalities while satisfying no-signaling? The question was first asked by Popescu and Rohrlich [3], and the answer was found to be negative: the impossibility of being represented in terms of local variables is a property shared by a broad class of no-signaling theories. Such theories predict intrinsic randomness, no-cloning [7, 8] and an information-disturbance trade-off [9], and allow for secure cryptography [10-12]. As for teleportation and entanglement swapping [13], after a first negative attempt [14], it seems that they can actually be defined as well within the general no-signaling framework [15, 16]. In summary, most of the features that have been highlighted as "typically quantum" are actually shared by all possible no-signaling theories. Only a few discrepancies have been noticed: some no-signaling theories would lead to an implausible simplification of distributed computational tasks [17-20] and would exhibit very limited dynamics [21]. This state of affairs highlights the importance of the no-signaling principle but leaves us still rather in the dark about the specificity of quantum theory.

In the present paper we define and study a previously unnoticed feature, which we call Information Causality. Information Causality generalizes no-signaling and is respected by both classical and quantum physics. However, as we shall show, it is violated by all no-signaling theories that are endowed with correlations stronger than the strongest quantum correlations. It can therefore be used as a principle to distinguish physical theories from nonphysical ones and is a good candidate to be one of the foundational assumptions at the very root of quantum theory.

Formulated as a principle, Information Causality states that the information gain that Bob can reach about a data set of Alice previously unknown to him, by using all his local resources and $m$ classical bits communicated by Alice, is at most $m$ bits. The standard no-signaling condition is just Information Causality for $m = 0$. It is important to keep in mind that the principle assumes classical communication: if quantum bits were allowed to be transmitted, the information gain could be higher, as demonstrated in the quantum super-dense coding protocol [22]. The efficiency of this protocol is based on the use of quantum entanglement, and Information Causality holds true even if quantum bits are transmitted, provided they are disentangled from the systems of the receiver. This follows from the Holevo bound, which limits the information gain after transmission of $m$ such qubits to $m$ classical bits.

We demonstrate that in a world in which certain tasks are "too simple" (compare with Refs. [17, 18]) and there exists implausible accessibility of remote data, Information Causality is violated. Consider a generic situation in which Alice has a database of $N$ bits described by a string $\vec{a}$. She would like to grant Bob access to as big a portion of the database as possible within a fixed amount of classical communication. If there were no pre-established correlations between them, communication of $m$ bits would open access to at most $m$ bits of the database. With pre-shared correlations they could expect to do better (but, as we shall show, in the real world they would be mistaken). For concreteness, consider a generic task illustrated in Figure 1. It is a distributed version of random access coding [23], oblivious transfer [24, 25] and related communication complexity problems [26]. Alice receives a string of $N$ random and independent bits, $\vec{a} = (a_0, a_1, \dots, a_{N-1})$. Bob receives a random value of $b \in \{0, \dots, N-1\}$ and is asked to give the value of the $b$th bit of Alice after receiving from her a message of $m$ classical bits. The restrictions are only on the communication that can take place after the inputs have been provided. The resources that Alice and Bob may have shared in advance are assumed to be no-signaling, because allowing signaling resources would open other communication channels. In a classical world, these additional resources would be correlated lists of bits; in a quantum world, Alice and Bob may share an arbitrary quantum state. But the task itself is open to accommodate any hypothetical resource producing no-signaling correlations, even those that go beyond the possibilities of quantum physics. We shall call these imaginary resources no-signaling boxes, in short NS-boxes. The impact of stronger-than-quantum correlations on the efficiency of random access coding has been studied recently from a different angle [24].

FIG. 1: The task. Alice receives $N$ random and independent bits $\vec{a} = (a_0, a_1, \dots, a_{N-1})$. In a separate location, Bob receives a random variable $b \in \{0, 1, \dots, N-1\}$. Alice sends $m$ classical bits to Bob, with the help of which Bob is asked to guess the value of the $b$th bit in Alice's list, $a_b$. Alice and Bob can share any no-signaling resources. Information Causality limits the efficiency of solutions to this task. It bounds the mutual information between Alice's data and all that Bob has at hand after receiving the message.

Clearly, there exists a protocol which allows Bob to give the correct value of at least $m$ bits. If Alice sends him an $m$-bit message $x = (a_0, \dots, a_{m-1})$, Bob will guess $a_b$ perfectly whenever $b \in \{0, \dots, m-1\}$. The price to pay is that he is bound to make a sheer random guess for $b \in \{m, \dots, N-1\}$. Since the pre-shared correlations contain no information about $\vec{a}$, for every strategy there will be a trade-off between the probabilities of guessing different bits of $\vec{a}$. Let us denote Bob's output by $\beta$. The efficiency of Alice's and Bob's strategy can be quantified by

$$I \equiv \sum_{K=0}^{N-1} I(a_K : \beta \mid b = K), \tag{1}$$

where $I(a_K : \beta \mid b = K)$ is the Shannon mutual information between $a_K$ and $\beta$, computed under the condition that Bob has received $b = K$. One can also show that

$$I \ge N - \sum_{K=0}^{N-1} h(P_K), \tag{2}$$

where $h(x) = -x \log_2 x - (1 - x) \log_2 (1 - x)$ is the binary entropy of $x$, and $P_K$ is the probability that $a_K = \beta$, again for the case of $b = K$. To get the inequality, the $a_K$ have been assumed to be unbiased and independently distributed (see details in the Supplementary Information).
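
The right-hand side of the bound (2) is easy to evaluate numerically. The following is a minimal Python sketch; the function names `binary_entropy` and `efficiency_lower_bound` are illustrative choices and do not come from the paper. The only input is the list of guessing probabilities $P_K$.

```python
from math import log2

def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, with the convention h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)

def efficiency_lower_bound(probs) -> float:
    """Right-hand side of Eq. (2): N - sum_K h(P_K), with N = len(probs)."""
    return len(probs) - sum(binary_entropy(p) for p in probs)

# Trivial protocol for N = 2, m = 1: Alice sends a_0 itself, so Bob knows a_0
# (P_0 = 1) and can only guess a_1 at random (P_1 = 1/2).
print(efficiency_lower_bound([1.0, 0.5]))
```

For this trivial protocol the bound evaluates to 1, which equals the number of communicated bits, so the necessary condition for Information Causality is saturated but not violated.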

Ideally, we would like to define that Information Causality holds if, after transferring the $m$-bit message, the mutual information between Alice's data $\vec{a}$ and everything that Bob has, i.e. the message $x$ and his part $B$ of the pre-shared correlation, is bounded by $m$. As intuitively appealing as such a definition is, it has the severe issue that it is not theory-independent. Specifically, a mutual information expression "$I(\vec{a} : x, B)$" has to be defined for a state involving objects from the underlying nonlocal theory (the possibilities include classical correlation, a shared quantum state, NS-boxes, etc.). It is far from clear whether mutual information can be defined consistently for all nonlocal correlations, or whether such a definition would be unique.

Instead, we shall show that if a mutual information can be defined that obeys three elementary properties, then (a) Information Causality holds and (b) $I(\vec{a} : x, B) \ge I$. Thus we obtain the following necessary condition for Information Causality:

$$I \le m. \tag{3}$$

We stress that the parameter $I$ is independent of any underlying physical theory: $I$ does not involve any details of a particular physical model but is fully determined by Alice's and Bob's input bits and Bob's output. In this sense it resembles Bell's parameter [5], which also involves only random variables and can be used to test different physical theories.

For a system composed of parts $A$, $B$, $C$, prepared in a state allowed by the theory, we need to assign symmetric and non-negative mutual informations $I(A : B)$, etc. The elementary properties mentioned above are the following.

(1) Consistency: If the subsystems $A$ and $B$ are both classical, then $I(A : B)$ should coincide with Shannon's mutual information.

(2) Data processing inequality: Acting on one of the parts locally by any state transformation allowed in the theory cannot increase the mutual information. I.e., if $B \to B'$ is a permissible map between systems, then $I(A : B) \ge I(A : B')$. This says that any local manipulation of data can only decrease information.

(3) Chain rule: There exists a conditional mutual information $I(A : B \mid C)$ such that the following identity is satisfied for all states and triples of parts: $I(A : B, C) = I(A : C) + I(A : B \mid C)$. Note that this implies an identity between ordinary mutual informations:

$$I(A : B, C) - I(A : C) = I(A : B \mid C) = I(A, C : B) - I(B : C).$$

Information Causality holds both in classical and quantum physics; we may focus on the latter because the former is a special case of it. This is because one can define quantum mutual information as a formal extension of Shannon's quantity, using von Neumann entropy [37], and all three of the above properties are fulfilled [25]. Details can be found in the Supplementary Information, but in a nutshell one argues as follows:

To show (a), denote by $B$ Bob's quantum system holding the shared quantum state $\rho_{AB}$, by $\vec{a} = (a_0, \dots, a_{N-1})$ Alice's data, and by $x$ the $m$-bit message; our objective is to prove $I(\vec{a} : x, B) \le m$. First, the chain rule for mutual information yields $I(\vec{a} : x, B) = I(\vec{a} : B) + I(\vec{a} : x \mid B)$. Second, $I(\vec{a} : B) = 0$ because without the message Alice's data and Bob's quantum state are independent (expressing the no-signaling condition). Third, we use the chain rule again to express the conditional mutual information as $I(\vec{a} : x \mid B) = I(x : \vec{a}, B) - I(x : B) \le I(x : \vec{a}, B)$. Finally, the latter can be upper bounded by $I(x : x) \le m$, invoking data processing. Similarly, (b) is obtained by repeated application of the chain rule, the data processing inequality and non-negativity of mutual information (see the Supplementary Information for details).

FIG. 2: van Dam's protocol [17] (see also Wolf and Wullschleger [25]). This is the simplest case in which Information Causality can be violated. Alice receives two bits $(a_0, a_1)$ and is allowed to send only one bit to Bob. A convenient way of thinking about no-signaling resources is to consider paired black boxes shared between Alice and Bob (NS-boxes). The correlations between inputs $a, b = 0, 1$ and outputs $A, B = 0, 1$ of the boxes are described by probabilities $P(A \oplus B = ab \mid a, b)$. No-signaling is satisfied due to uniformly random local outputs. With suitable NS-boxes Alice and Bob violate Information Causality. She uses $a = a_0 \oplus a_1$ as an input to the shared NS-box and obtains the outcome $A$, which is used to compute her message bit $x = a_0 \oplus A$ for Bob. Bob, on his side, inputs $b = 0$ if he wants to learn $a_0$, and $b = 1$ if he wants to learn $a_1$; he gets the outcome $B$. Upon receiving $x$ from Alice, Bob computes his guess $\beta = x \oplus B = a_0 \oplus A \oplus B$. The probability that Bob correctly gives the value of the bit $a_0$ is $P_I = \frac{1}{2} [P(A \oplus B = 0 \mid 0, 0) + P(A \oplus B = 0 \mid 1, 0)]$, and the analogous probability for the bit $a_1$ reads $P_{II} = \frac{1}{2} [P(A \oplus B = 0 \mid 0, 1) + P(A \oplus B = 1 \mid 1, 1)]$, which follow by inspection of the different cases.

In order to study how other no-signaling theories can violate Information Causality, we focus on the necessary condition (3). First consider the simplest example of a two-bit input of Alice, $(a_0, a_1)$; it is described in Figure 2. The probability that Bob correctly gives the value of the bit $a_0$ is

$$P_I = \frac{1}{2} \left[ P(A \oplus B = 0 \mid 0, 0) + P(A \oplus B = 0 \mid 1, 0) \right], \tag{4}$$

and the analogous probability for the bit $a_1$ reads

$$P_{II} = \frac{1}{2} \left[ P(A \oplus B = 0 \mid 0, 1) + P(A \oplus B = 1 \mid 1, 1) \right], \tag{5}$$

where the symbol $\oplus$ denotes summation modulo 2.

One can recognize that these probabilities are intimately linked with the Clauser-Horne-Shimony-Holt parameter [21] $S$, which can be used to quantify the strength of correlations. Indeed,

$$S = \sum_{a=0}^{1} \sum_{b=0}^{1} P(A \oplus B = ab \mid a, b) = 2 (P_I + P_{II}). \tag{6}$$

The classical correlations are bounded by $S \le S_C = 3$ (the equivalent form of the Bell inequality [5, 21]). Quantum correlations exceed this limit up to $S \le S_Q = 2 + \sqrt{2}$ (the so-called Tsirelson bound [30]). The maximal algebraic value of $S_{NS} = 4$ is reached by the Popescu-Rohrlich (PR) box [3], which is an extremal no-signaling resource. PR-boxes maximally violate Information Causality because they predict $P_I = P_{II} = 1$, i.e. $I = 2$ for $m = 1$. Bob can learn perfectly either bit. The value $I = 2$ measures the sum total of the information accessible to Bob. However, he cannot learn both of Alice's bits; the latter would imply signaling.
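
The claim $P_I = P_{II} = 1$ for PR-boxes can be checked by exhaustive enumeration. The sketch below is only an illustration: the function names are hypothetical, and the PR-box is modelled as a list of its equally likely output pairs satisfying $A \oplus B = ab$.

```python
from itertools import product

def pr_box_outputs(a: int, b: int):
    """All equally likely output pairs (A, B) of a PR-box: A xor B = a AND b."""
    return [(A, A ^ (a & b)) for A in (0, 1)]

def bob_guess(a0: int, A: int, B: int) -> int:
    """van Dam's protocol: Alice sends x = a0 xor A, Bob outputs x xor B."""
    x = a0 ^ A
    return x ^ B

def always_correct() -> bool:
    """Check every input (a0, a1, b) and every possible pair of box outputs."""
    for a0, a1, b in product((0, 1), repeat=3):
        # Alice feeds a0 xor a1 into her box, Bob feeds the index b into his.
        for A, B in pr_box_outputs(a0 ^ a1, b):
            if bob_guess(a0, A, B) != (a0, a1)[b]:
                return False
    return True

print(always_correct())
```

The check succeeds for both values of $b$, although Bob uses his box only once and therefore reads out only one of the two bits in any given run.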

The protocol works just as well for any Boolean function of the inputs, $f(\vec{a}, b)$. It is sufficient that Alice inserts into her PR-box the sum $f(\vec{a}, 0) \oplus f(\vec{a}, 1)$. If Information Causality is maximally violated, Bob can learn the value of $f(\vec{a}, b)$ for any one of his inputs, irrespective of Alice's input data. Even more surprisingly, this is so also if he does not know the function to be computed.

We shall now demonstrate that Information Causality is violated as soon as the quantum Tsirelson limit for the CHSH inequality is exceeded. This result can also be seen as an information-theoretic proof of the Tsirelson bound, independent of the formalism of Hilbert spaces, relying instead only on the existence of a consistent information calculus for certain correlations.

First we note that, using a suitable local randomization procedure that does not change the value of the parameter $S$, any NS-box can be brought to a simple form [7]: the local outcomes are uniformly random and the correlations are given by

$$P(A \oplus B = ab \mid a, b) = \frac{1}{2} (1 + E), \tag{7}$$

with $0 \le E \le 1$. The case $E = 1$ corresponds to the PR-box; $E = 0$ describes uncorrelated random bits. The classical bound $S \le S_C$ is violated as soon as $E > \frac{1}{2}$; the Tsirelson bound of quantum physics becomes $E \le E_Q = \frac{1}{\sqrt{2}}$, attained by performing suitable measurements on the singlet state of two two-level systems [21, 30].
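
The relation between $E$ and the CHSH parameter follows by inserting (7) into (6): each of the four terms equals $(1 + E)/2$, so $S = 2(1 + E)$. A small Python sketch, with illustrative names that are not part of the paper, tabulates the three thresholds:

```python
from math import sqrt

def chsh_value(E: float) -> float:
    """S of Eq. (6) for a box of the form (7): four terms, each (1 + E) / 2."""
    return 2.0 * (1.0 + E)

E_CLASSICAL = 0.5            # gives S_C = 3
E_QUANTUM = 1.0 / sqrt(2.0)  # gives S_Q = 2 + sqrt(2)
E_PR = 1.0                   # gives S_NS = 4

for name, E in [("classical", E_CLASSICAL), ("quantum", E_QUANTUM), ("PR-box", E_PR)]:
    print(name, E, chsh_value(E))
```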

The bound that Information Causality imposes on correlations can be identified using a pyramid of NS-boxes and nesting the simple protocol described above (see Figure 3). Now Alice receives $N = 2^n$ bits and the probability that Bob guesses $a_K$ correctly is given by

$$P_K = \frac{1}{2} (1 + E^n). \tag{8}$$



Inserting this expression into (2), one finds that the Information Causality condition $I \le 1$ is violated as soon as $2E^2 > 1$ and $n$ is large enough, i.e. $E > E_Q$. Since all NS-boxes can be brought to the form (7) without changing the value of $S$, we conclude indeed that every NS-box with stronger-than-quantum correlations violates the Information Causality condition. In the Supplementary Information the more general result is proved that for any $\frac{1}{2} (E_I^2 + E_{II}^2) > E_Q^2$ (where $E_j = 2P_j - 1$; see Eqs. (4) and (5)) Information Causality is violated, and conversely, if it is fulfilled, that there exists a quantum correlation with these probabilities.

FIG. 3: Information Causality identifies the strongest quantum correlations. The possible no-signaling correlations satisfying Information Causality can be precisely identified using the depicted scheme. Alice receives $N = 2^n$ input bits and, correspondingly, Bob receives $n$ input bits $b_k$ which describe the index of the bit he is interested in, $b = \sum_{k=0}^{n-1} b_k 2^k$. She is allowed to send a single bit, $m = 1$. In the case of $n = 2$, to encode information about her data, Alice uses a pyramid of NS-boxes as shown in panel (a). Note that Fig. 2 shows how Bob can correctly guess the first or second bit of Alice using a single pair of boxes (the case of $n = 1$). If Alice has more bits, they use this protocol recursively in the following way. For example, for four input bits of Alice, two pairs of NS-boxes on the level $k = 0$ allow Bob to guess the value of any one of Alice's bits as soon as he knows either $a_0 \oplus A_L$ or $a_2 \oplus A_R$, where $A_L$ ($A_R$) is the output of her left (right) box on the level $k = 0$; these are the one-bit messages of the protocol in Fig. 2. They can be encoded using the third box, on the level $k = 1$, by inserting their sum into Alice's box and sending $x = a_0 \oplus A_L \oplus A$ to Bob ($A$ is the output of her box on the level $k = 1$). Depending on the bit he is interested in, he now reads a suitable message using the box on the level $k = 1$ and uses one of the boxes on the level $k = 0$. An example of a situation in which Bob aims at the value of $a_2$ or $a_3$ is depicted in panel (b). Bob's final answer is $x \oplus B_0 \oplus B_1$, where $B_k$ is the output of his box on the $k$th level. Generally, Alice and Bob use a pyramid of $N - 1$ pairs of boxes placed on $n$ levels. Looking at the binary decomposition of $b$, Bob aims $(n - r)$ times at the left bit and $r$ times at the right, where $r = b_0 + \dots + b_{n-1}$. His final guess is the sum $\beta = x \oplus B_0 \oplus \dots \oplus B_{n-1}$. Therefore, Bob's final guess is correct whenever he has made an even number of errors in the intermediate steps. This leads to Eq. (8) for the probability of his correct final guess (see Supplementary Information for the details of this calculation).
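
The onset of the violation can be explored numerically by inserting $P_K = \frac{1}{2}(1 + E^n)$ into the bound (2) with $N = 2^n$. The sketch below is an illustration under stated assumptions: the function names and the search cut-off `max_n` are hypothetical choices, and plain floating-point arithmetic is used, so values of $E$ extremely close to $E_Q$ (which need very large $n$) are outside its reach.

```python
from math import log2

def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1.0 - p) * log2(1.0 - p)

def nested_lower_bound(E: float, n: int) -> float:
    """Bound (2) with P_K = (1 + E**n) / 2 for all N = 2**n bits."""
    N = 2 ** n
    return N * (1.0 - binary_entropy(0.5 * (1.0 + E ** n)))

def first_violating_depth(E: float, max_n: int = 20):
    """Smallest n <= max_n for which the bound exceeds m = 1, else None."""
    for n in range(1, max_n + 1):
        if nested_lower_bound(E, n) > 1.0:
            return n
    return None

for E in (0.70, 0.72, 0.80, 1.00):
    print(E, first_violating_depth(E))
```

Values of $E$ below $1/\sqrt{2} \approx 0.7071$ never push the bound above one bit within the searched range, whereas values above it do so at some finite depth that grows as $E$ approaches $E_Q$ from above.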

In conclusion, we have identified the principle of In- 
formation Causality, which precisely distinguishes physi- 
cally realized correlations from nonphysical ones (in the 
sense that quantum mechanics cannot reach them) . It is 
phrased in operational terms and in a theory-independent 
way and therefore we suggest it is at the same founda- 
tional level as the no-signaling condition itself, of which 
it is a generalization. 

The new principle is respected by all correlations accessible with quantum physics, while it excludes all no-signaling correlations that violate the quantum Tsirelson bound. Among the correlations that do not violate that bound, it is not known whether Information Causality singles out exactly those allowed by quantum physics. If it does, the new principle would acquire an even stronger status.

SUPPLEMENTARY INFORMATION

THE GENERIC NATURE OF THE CONSIDERED TASK

Assume that Alice has a data set, a CD or whatever, which can be encoded into an $N$-bit string $\vec{a}$. Bob may wish to have access to that data set. Of course, without any communication he has no access at all. However, if they share randomness, or a source of randomness, and a protocol, Alice can, by transferring $m$ bits, allow him access to a specific $m$-bit subsequence of her data. Thus $N - m$ bits are still inaccessible to Bob. Transfer of $m$ bits reduces the number of inaccessible bits from $N$ to $N - m$, or to a larger number if the protocol is not optimal. We have an accessibility gain of up to $m$ bits. PR-boxes clearly violate this limit. A transfer of $m$ bits, due to no-signaling, still allows access to $m$-bit sequences only; however, the full data set is open for such a readout. The number of inaccessible bits is reduced to 0. The accessibility gain is $N$.

In the most elementary case of a one-bit transfer, Information Causality allows one, in an optimal protocol, to decode the value of just one specific bit. For PR-boxes, a one-bit transfer opens access to any bit. That is, all bits are readable, with no-signaling constraining the actual readout to just one of them.

Further, note that any Boolean function of Alice's and Bob's data can be put in the following way:

$$f(\vec{a}, b) = \sum_{b'} \delta_{b, b'} f_{\vec{a}}(b'),$$

where $f_{\vec{a}}(b')$ is again Boolean. For example, the original data string is a simple function of this kind, $A_{\vec{a}}(b) = a_b$. Therefore any function $f_{\vec{a}}(b)$ is just a preparation of a new string of data, in the form of a Boolean function of the old string. It has exactly the same length. Thus our problem contains within itself the completely general problem of obtaining the value of any $f(\vec{a}, b)$. That is, it is a generic problem for dichotomic functions.

INFORMATION CALCULUS AND
INFORMATION CAUSALITY

Here we prove bounds for the mutual information $I(\vec{a} : x, B)$, where $\vec{a}$ is a string of Alice's bits, $x$ is her classical message and $B$ denotes Bob's part of the pre-shared no-signaling resource, assuming the three abstract properties for the mutual information, as follows.

(1) Consistency: If the subsystems $A$ and $B$ are both classical, then $I(A : B)$ should coincide with Shannon's mutual information.

(2) Data processing inequality: Acting on one of the parts locally by any state transformation allowed in the theory cannot increase the mutual information. I.e., if $B \to B'$ is a permissible map between systems, then $I(A : B) \ge I(A : B')$. This says that any local manipulation of data can only decrease information.

(3) Chain rule: There exists a conditional mutual information $I(A : B \mid C)$ such that the following identity is satisfied for all states and triples of parts: $I(A : B, C) = I(A : C) + I(A : B \mid C)$. Note that this implies an identity between ordinary mutual informations:

$$I(A : B, C) - I(A : C) = I(A : B \mid C) = I(A, C : B) - I(B : C).$$

We show first that $I(\vec{a} : x, B) \ge I \equiv \sum_{K=0}^{N-1} I(a_K : \beta \mid b = K)$ in the case of independent input bits of Alice. Namely, by the chain rule (3), we can isolate Alice's first bit, obtaining $I(a_0, \dots, a_{N-1} : x, B) = I(a_0 : x, B) + I(a_1, \dots, a_{N-1} : x, B \mid a_0)$. The second term on the right-hand side equals, using the chain rule once more, $I(a_1, \dots, a_{N-1} : x, B \mid a_0) = I(a_1, \dots, a_{N-1} : x, B, a_0) - I(a_1, \dots, a_{N-1} : a_0)$, in which, due to the independence of Alice's inputs, $I(a_1, \dots, a_{N-1} : a_0) = 0$. Applying the data processing inequality (2) to the first term here then implies

$$I(\vec{a} : x, B) \ge I(a_0 : x, B) + I(a_1, \dots, a_{N-1} : x, B). \tag{9}$$

Iterating these steps $N - 1$ times on the rightmost term gives

$$I(\vec{a} : x, B) \ge \sum_{K=0}^{N-1} I(a_K : x, B). \tag{10}$$

Finally, we observe that Bob's guess bit $\beta$ is obtained at the end from $b$, $x$ and $B$. Hence, the data processing inequality puts a limit of $I(a_K : \beta \mid b = K) \le I(a_K : x, B)$ on Bob's accessible information. Putting this together with Eq. (10) yields the result,

$$I(\vec{a} : x, B) \ge \sum_{K=0}^{N-1} I(a_K : \beta \mid b = K) = I, \tag{11}$$

which is the efficiency described in the main text. Note that implicitly we have made use of the consistency condition (1) here.

Second, we prove that the same assumptions lead to $I(\vec{a} : x, B) \le m$, i.e. Information Causality. To do so we need a little preparation. Note that from consistency (1) and data processing (2), we inherit automatically an important property of Shannon mutual information, namely the fact that $I(A : B) = 0$ if two systems $A$ and $B$ are independent, i.e. the state is a (tensor) product of states on $A$ and on $B$, respectively. To prove this, observe that the state can thus be prepared by allowed local operations starting from two classical independent systems. These have zero mutual information by consistency (1), so $I(A : B) \le 0$ by data processing (2). On the other hand, mutual information must be non-negative, hence $I(A : B) = 0$.

Now,

$$\begin{aligned}
I(\vec{a} : x, B) &= I(\vec{a} : B) + I(\vec{a} : x \mid B) \\
&= I(\vec{a} : x \mid B) \\
&= I(x : \vec{a}, B) - I(x : B) \\
&\le I(x : \vec{a}, B),
\end{aligned}$$

where we have invoked the chain rule (3), the independence of $\vec{a}$ from $B$ (which owes itself to the no-signaling condition), the chain rule once more and non-negativity of mutual information.
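
The quantities appearing in this chain can be evaluated explicitly in a small classical case, where by consistency (1) they are ordinary Shannon quantities. The sketch below is a toy illustration only: the strategy (a shared uniform bit $r$ as Bob's part $B$, and the message $x = a_0 \oplus r$) and all function names are assumptions made for the example, not a construction from the paper.

```python
from collections import defaultdict
from itertools import product
from math import log2

def mutual_information(joint, left, right):
    """Shannon mutual information I(left : right) in bits.

    joint maps outcome tuples to probabilities; left and right are tuples of
    coordinate indices selecting the two groups of variables.
    """
    p_left, p_right, p_both = defaultdict(float), defaultdict(float), defaultdict(float)
    for outcome, p in joint.items():
        u = tuple(outcome[i] for i in left)
        v = tuple(outcome[i] for i in right)
        p_left[u] += p
        p_right[v] += p
        p_both[(u, v)] += p
    return sum(p * log2(p / (p_left[u] * p_right[v]))
               for (u, v), p in p_both.items() if p > 0)

# Outcome layout (a0, a1, r, x): r is a uniform bit shared in advance (Bob's
# part B), and Alice sends the single bit x = a0 xor r, so m = 1.
joint = {(a0, a1, r, a0 ^ r): 1 / 8 for a0, a1, r in product((0, 1), repeat=3)}

info_before = mutual_information(joint, (0, 1), (2,))   # I(a : B)
info_after = mutual_information(joint, (0, 1), (2, 3))  # I(a : x, B)
print(info_before, info_after)
```

For this strategy Bob's shared bit alone carries no information about Alice's data, and after the message he holds exactly one bit about it, in line with the bound for $m = 1$.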

We are finished once we argue that $I(x : \vec{a}, B) \le I(x : x)$, because the latter is a quantity only involving classical objects, so it can be evaluated as the Shannon entropy of $x$ by the consistency requirement (1), and the entropy is upper bounded by $m$. But this inequality follows once more from data processing (2), because the joint state of $x$, $\vec{a}$ and $B$ is given by some distribution on $x$, and a joint state for $\vec{a}$ and $B$ for each value $x$ can take. In other words, there is a state preparation for each value of $x$, hence there must exist the corresponding state transformation $x \to \vec{a}, B$ in the theory.

SIMPLIFIED LOWER BOUND ON I

The conditional mutual informations can be simplified using the probability of Bob's correct guess of $a_K$, denoted by $P_K$, i.e. the probability that $a_K \oplus \beta = 0$, given $b = K$. Since Alice's inputs are uniformly random, the binary entropy $h(a_K) = 1$, and we have $I(a_K : \beta \mid b = K) = 1 - H(a_K \mid \beta, b = K)$. Note that $H(a_K \mid \beta, b = K) = H(a_K \oplus \beta \mid \beta, b = K)$ because knowing $\beta$ leaves the same uncertainty about $a_K$ and $a_K \oplus \beta$ (this can also be proved using the chain rule for conditional entropy). Omitting the conditioning on $\beta$ can only increase the entropy, $H(a_K \oplus \beta \mid \beta, b = K) \le H(a_K \oplus \beta \mid b = K) = h(P_K)$. Therefore,

$$I \ge N - \sum_{K=0}^{N-1} h(P_K), \tag{12}$$



as stated in the main text. This inequality can also be seen as a special case of Fano's inequality [28]. In a more general case, in which Alice's inputs acquire values from an alphabet of $d$ elements, Fano's inequality gives the bound

$$I \ge N \log_2 d - \sum_{K=0}^{N-1} h(P_K) - \sum_{K=0}^{N-1} (1 - P_K) \log_2 (d - 1). \tag{13}$$

Similarly, one can write a bound for any inputs of Alice.

Since the necessary condition for Information Causality to hold reads $I \le m$, one finds by looking at the expression (12) that Information Causality limits the probability of Bob's correct guess, unless all information about Alice's bits is communicated to Bob:

$$\sum_{K=0}^{N-1} h(P_K) \ge N - m. \tag{14}$$



INFORMATION CAUSALITY IN CLASSICAL 
AND QUANTUM PHYSICS 

Here we show that Information Causality holds in classical and quantum physics. All we have to do, in the light of our previous reasoning, is to write down expressions for the mutual information and conditional mutual information, and confirm that they satisfy properties (1)-(3). We focus on quantum correlations because classical correlations form a subset of quantum correlations. With respect to any tripartite state $\rho_{ABC}$, denote by $\rho_A$, etc. its reduced states, and write $S(\rho) = -\mathrm{Tr}\, \rho \log \rho$ for the von Neumann entropy. Then let

$$I(A : B) = S(\rho_A) + S(\rho_B) - S(\rho_{AB}),$$
$$I(A : B \mid C) = S(\rho_{AC}) + S(\rho_{BC}) - S(\rho_{ABC}) - S(\rho_C).$$

Both expressions are manifestly invariant under swapping $A$ and $B$, and non-negative by strong subadditivity [28]. Clearly, consistency (1) holds, as classical correlations are embedded as matrices diagonal in some fixed local bases, and then von Neumann entropy reduces to Shannon entropy. Also the chain rule (3) is an easily verified identity. The data processing inequality (2) is the quantum data processing inequality [21], which is once more equivalent to strong subadditivity.

To verify the steps in our abstract derivation of Information Causality in the quantum case, denote the initial state shared between Alice and Bob by $\rho_{AB}$. Including Alice's data as orthogonal states of a reference system $R$, the situation before the communication can be described by the state

$$\frac{1}{2^N} \sum_{\vec{a} \in \{0,1\}^N} |\vec{a}\rangle\langle\vec{a}|_R \otimes \rho_{AB}. \tag{15}$$

For each value of $\vec{a}$, Alice has to perform local operations to obtain the message $x$ she wants to send to Bob. Whatever her algorithm to do so, it can be condensed into a quantum measurement (POVM) $(M_x^{(\vec{a})})_{x \in \{0,1\}^m}$, and so the joint state of Alice's data, the message (represented by orthogonal states of a "message" system $X$) and Bob's system is given by

$$\frac{1}{2^N} \sum_{\vec{a} \in \{0,1\}^N} \sum_{x \in \{0,1\}^m} |\vec{a}\rangle\langle\vec{a}|_R \otimes |x\rangle\langle x|_X \otimes \mathrm{Tr}_A \left( \rho_{AB} \, (M_x^{(\vec{a})} \otimes \mathbb{1}) \right). \tag{16}$$



TSIRELSON BOUND FROM INFORMATION 
CAUSALITY 

We present a proof that Information Causality is violated by all stronger-than-quantum correlations. Our protocol for the main proof uses NS-boxes, which produce uniformly random local outcomes and correlations described by the probabilities $P(A \oplus B = ab \mid a, b)$, where $a, b = 0, 1$ are the inputs to the boxes and $A, B = 0, 1$ are the outputs for Alice and Bob, respectively. It will be sufficient to consider the situation where Alice communicates only one bit to Bob, i.e. $m = 1$. The number of Alice's input bits is chosen as $N = 2^n$, where $n$ is an integer parameterizing the task. Correspondingly, Bob receives $n$ input bits to encode the index $b$ as a binary string $(b_0, b_1, \dots, b_{n-1})$, i.e. $b = \sum_{k=0}^{n-1} b_k 2^k$.

We generalize the procedure given in the main text
to $N$ bits, recursively, using the insight of [17] that
any function which can be written as a Boolean formula
with ANDs, XORs and NOTs can be computed in a distributed
manner using the same number of PR-boxes [3] as ANDs and
one bit of communication. The function we are considering
is $f_n(a, b) = a_b$, with $a = (a_0, a_1, \ldots, a_{N-1})$.
In the simplest case, $n = 1$, the function of the task reads

\[ f_1((a_0, a_1), b_0) = a_0 \oplus b_0 (a_0 \oplus a_1). \tag{17} \]

It involves a single AND. Alice inputs $a_0 \oplus a_1$ into
the PR-box and Bob inputs $b_0$; with her output $A$ Alice
forms the message $x = a_0 \oplus A$, so that Bob can obtain
$x \oplus B = a_b$.
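The single-box protocol just described can be checked by direct simulation. The following is a minimal sketch, assuming an idealized PR-box modelled as a Python function; the names `pr_box` and `protocol_n1` are illustrative and do not come from the paper.

```python
import random

def pr_box(a_in, b_in, rng):
    """Ideal PR-box (assumed model): uniformly random local outputs
    with A xor B = a_in AND b_in."""
    A = rng.randint(0, 1)
    B = A ^ (a_in & b_in)
    return A, B

def protocol_n1(a0, a1, b0, rng):
    """n = 1 protocol: Alice sends the single bit x = a0 xor A,
    Bob answers x xor B."""
    A, B = pr_box(a0 ^ a1, b0, rng)
    x = a0 ^ A           # the one communicated bit
    return x ^ B         # Bob's guess for a_{b0}

rng = random.Random(0)
# Check every input combination several times, since A is random.
ok = all(
    protocol_n1(a0, a1, b0, rng) == (a0, a1)[b0]
    for a0 in (0, 1) for a1 in (0, 1) for b0 in (0, 1)
    for _ in range(20)
)
print(ok)
```

Because $A \oplus B = (a_0 \oplus a_1) b_0$, Bob's answer equals $a_0 \oplus b_0(a_0 \oplus a_1)$ irrespective of the random value of $A$, which is what the loop confirms.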

Moving to $n > 1$, write $a = a' a''$ with two bit-strings $a'$
and $a''$ of length $N/2 = 2^{n-1}$ each. Then it is a
straightforward exercise to verify that

\[ f_n(a, b) = f_{n-1}(a', b') \oplus b_{n-1} \left[ f_{n-1}(a', b') \oplus f_{n-1}(a'', b') \right], \tag{18} \]

where $b'$ is the string of Bob's $n - 1$ bits $(b_0, \ldots, b_{n-2})$.
Thus, if $f_{n-1}$ could be written using $N/2 - 1$ ANDs, this
formula expresses $f_n$ using $N - 1$ AND operations. For
instance,

\[ f_2((a_0, a_1, a_2, a_3), (b_0, b_1)) = a_0 \oplus b_0 (a_0 \oplus a_1) \oplus b_1 \left[ a_0 \oplus b_0 (a_0 \oplus a_1) \oplus a_2 \oplus b_0 (a_2 \oplus a_3) \right]. \tag{19} \]
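The "straightforward exercise" behind eqs. (18) and (19) can also be done by brute force. The sketch below is only an illustration: the function names are hypothetical, and it assumes that $a'$ is the first half of $a$ and that $b_{n-1}$ is the most significant bit of $b$.

```python
from itertools import product

def f(a, b_bits):
    """f_n(a, b) built recursively from eqs. (17) and (18).
    a has length 2**n; b_bits = (b_0, ..., b_{n-1})."""
    if len(b_bits) == 1:
        return a[0] ^ (b_bits[0] & (a[0] ^ a[1]))      # eq. (17)
    half = len(a) // 2
    lo = f(a[:half], b_bits[:-1])                      # f_{n-1}(a', b')
    hi = f(a[half:], b_bits[:-1])                      # f_{n-1}(a'', b')
    return lo ^ (b_bits[-1] & (lo ^ hi))               # eq. (18)

def f2_explicit(a, b0, b1):
    """Eq. (19) written out term by term."""
    a0, a1, a2, a3 = a
    return (a0 ^ (b0 & (a0 ^ a1))
            ^ (b1 & (a0 ^ (b0 & (a0 ^ a1)) ^ a2 ^ (b0 & (a2 ^ a3)))))

def check(n):
    """True if f_n(a, b) == a_b for all inputs, with b = sum_k b_k 2^k."""
    for a in product((0, 1), repeat=2 ** n):
        for b_bits in product((0, 1), repeat=n):
            b = sum(bit << k for k, bit in enumerate(b_bits))
            if f(a, b_bits) != a[b]:
                return False
    return True

print(check(1), check(2), check(3))
```

Each level of the recursion adds exactly one AND to the two sub-formulas, which is the origin of the count $2(N/2 - 1) + 1 = N - 1$.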

To convert this to a distributed protocol, Alice and Bob
use three PR-boxes. To the first one Alice inputs $a_0 \oplus a_1$,
to the second one $a_2 \oplus a_3$, and to the third one
$a_0 \oplus A_1 \oplus a_2 \oplus A_2$, where $A_1$ and $A_2$ are
her outputs from the first and second box, respectively. She
transmits $x = a_0 \oplus A_1 \oplus A_3$, where $A_3$ is her
output from the third box. Depending on his inputs, Bob will
use two different boxes to decode $a_b$. He inserts the bit
$b_1$, which distinguishes the groups $(a_0, a_1)$ and
$(a_2, a_3)$, into the third box, obtaining output $B_3$. Due
to the correlations of the boxes, the sum $x \oplus B_3$ gives
the value of $a_0 \oplus A_1$ or $a_2 \oplus A_2$, depending on
whether $b_1 = 0$ or $b_1 = 1$. These are exactly the messages
he would obtain from Alice in the scenario with the single
pair of boxes, and the protocol is now reduced to the previous
one. For example, if $b_1 = 1$ he will input $b_0$ to the second
PR-box, giving him an output $B_2$, so that he can form
$x \oplus B_3 \oplus B_2 = a_2 \oplus A_2 \oplus B_2$, which is
either $a_2$ or $a_3$. Note that the other PR-box is ignored by
Bob, or he may as well input $b_0$ to it, too; this is not
important because he does not need its output.

In the general case, Alice and Bob share $N - 1 = 2^n - 1$
PR-boxes, and for every set of inputs Bob uses $n$ of
them. The protocol can be explained recursively, based
on eq. (18): Alice and Bob use the protocol for $n - 1$
of Bob's bits on the input pairs $(a', b')$ and $(a'', b')$,
involving $N/2 - 1$ PR-boxes in each one, resulting in two
single-bit messages $x'$ and $x''$ that Alice would send to
Bob if their objective were to compute $f_{n-1}$. Instead, she
inputs $x' \oplus x''$ into the last PR-box, while Bob inputs
$b_{n-1}$; they obtain output bits $A$ and $B$, respectively,
so when Alice finally sends the message $x = x' \oplus A$, this
allows Bob to obtain $x' \oplus b_{n-1}(x' \oplus x'')$, which is
either $x'$ or $x''$ depending on whether $b_{n-1} = 0$ or
$b_{n-1} = 1$. Then, the protocol for $n - 1$ bits tells him
the outputs of which PR-boxes he should use in order to arrive
at $a_b$. Since in the protocol for $n - 1$ of Bob's bits he only
needs the outputs of $n - 1$ boxes, he reads $n$ outputs for $n$
bits; likewise, Alice uses $2(N/2 - 1) + 1 = N - 1$ PR-boxes in
total.
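The recursive protocol can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the boxes are ideal PR-boxes, each box is identified by a hypothetical tuple `label` giving its position in the recursion tree, and the dictionary `boxes` merely stands in for the shared resource so that the correlation $A \oplus B = (\text{Alice's input}) \cdot (\text{Bob's input})$ can be evaluated when Bob makes his choice.

```python
import random

def alice(a, rng, boxes, label=()):
    """Return Alice's one-bit message for the block a and record,
    for every box she uses, her input and her (random) output."""
    if len(a) == 2:
        x_in, base = a[0] ^ a[1], a[0]
    else:
        half = len(a) // 2
        x1 = alice(a[:half], rng, boxes, label + (0,))   # x' for a'
        x2 = alice(a[half:], rng, boxes, label + (1,))   # x'' for a''
        x_in, base = x1 ^ x2, x1
    A = rng.randint(0, 1)          # uniformly random local output
    boxes[label] = (x_in, A)
    return base ^ A

def bob(x, b_bits, boxes):
    """Bob decodes a_b from x, reading one box per bit of b,
    starting with b_{n-1} at the top of the recursion."""
    label, guess, reads = (), x, 0
    for bit in reversed(b_bits):
        x_in, A = boxes[label]
        B = A ^ (x_in & bit)       # ideal PR-box correlation
        guess ^= B
        reads += 1
        label += (bit,)            # descend to the selected sub-block
    return guess, reads

rng = random.Random(1)
n = 3
N = 2 ** n
a = [rng.randint(0, 1) for _ in range(N)]
results = []
for b in range(N):
    boxes = {}
    x = alice(a, rng, boxes)
    b_bits = [(b >> k) & 1 for k in range(n)]   # (b_0, ..., b_{n-1})
    guess, reads = bob(x, b_bits, boxes)
    results.append((guess == a[b], reads, len(boxes)))
print(all(r == (True, n, N - 1) for r in results))
```

The final check confirms the three counts stated in the text: Bob's guess is correct, he reads $n$ outputs, and Alice uses $N - 1$ boxes.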

Now, we simply substitute PR-boxes with NS-boxes,
having their probabilities of guessing the first and second
bit of Alice's input sum given by $P_{\mathrm{I}}$ and
$P_{\mathrm{II}}$, respectively. By looking at his input bits,
Bob finds that the final guess of the value of $a_b$ involves
aiming $(n - k)$ times at the left bit and $k$ times at the
right, where $k = b_0 + \cdots + b_{n-1}$ is the number of 1s
in the binary decomposition of $b$. Since Bob's answer is
computed as the sum of the message $x$ and suitable outputs of
$n$ boxes, whenever an even number of boxes produce "wrong"
outputs, i.e. such that $A \oplus B \neq ab$, Bob still arrives
at the correct final answer. Therefore, Bob's guess is correct
whenever he has made an even number of errors in the
intermediate steps.
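Let us denote by

\[ Q^{(k)}_{\mathrm{even}}(P) = \sum_{j=0}^{\lfloor k/2 \rfloor} \binom{k}{2j} P^{k-2j} (1-P)^{2j} = \frac{1}{2} \left[ 1 + (2P-1)^k \right] \tag{20} \]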



the probability to make an even number of errors when 
using k pairs of boxes, each producing a correct value 
with probability P. Similarly, the probability to make 
an odd number of errors reads 



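\[ Q^{(k)}_{\mathrm{odd}}(P) = \sum_{j=0}^{\lfloor (k-1)/2 \rfloor} \binom{k}{2j+1} P^{k-2j-1} (1-P)^{2j+1} = \frac{1}{2} \left[ 1 - (2P-1)^k \right]. \tag{21} \]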
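The closed forms in eqs. (20) and (21) follow from expanding $[P \pm (1-P)]^k$ with the binomial theorem and taking half the sum and half the difference. A quick numerical check is sketched below; the helper names `q_even` and `q_odd` are illustrative only.

```python
from math import comb

def q_even(k, P):
    """Probability of an even number of errors among k boxes, eq. (20)."""
    return sum(comb(k, 2 * j) * P ** (k - 2 * j) * (1 - P) ** (2 * j)
               for j in range(k // 2 + 1))

def q_odd(k, P):
    """Probability of an odd number of errors among k boxes, eq. (21)."""
    return sum(comb(k, 2 * j + 1) * P ** (k - 2 * j - 1) * (1 - P) ** (2 * j + 1)
               for j in range((k + 1) // 2))

# Largest deviation between the sums and the closed forms on a small grid.
max_err = max(
    max(abs(q_even(k, P) - 0.5 * (1 + (2 * P - 1) ** k)),
        abs(q_odd(k, P) - 0.5 * (1 - (2 * P - 1) ** k)))
    for k in range(0, 12)
    for P in (0.5, 0.6, 0.75, 0.85, 1.0)
)
print(max_err < 1e-12)
```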



With this notation, the probability that Bob's final guess 
of the value of $a_b$ is correct is given by

\[ P_k = \frac{1}{2} \left[ 1 + E_{\mathrm{I}}^{\,n-k} E_{\mathrm{II}}^{\,k} \right] \tag{22} \]

with $E_j = 2P_j - 1$.
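The step from eqs. (20) and (21) to eq. (22) can be written out explicitly. In the sketch below, $Q^{(k)}_{\mathrm{even}}(P)$ and $Q^{(k)}_{\mathrm{odd}}(P)$ denote the probabilities of an even and an odd number of errors defined in eqs. (20) and (21); this labelling is our assumption. Bob's total number of errors is even when the errors among the $n-k$ "left" boxes and among the $k$ "right" boxes have the same parity:

```latex
\begin{align*}
P_k &= Q^{(n-k)}_{\mathrm{even}}(P_{\mathrm{I}})\, Q^{(k)}_{\mathrm{even}}(P_{\mathrm{II}})
     + Q^{(n-k)}_{\mathrm{odd}}(P_{\mathrm{I}})\, Q^{(k)}_{\mathrm{odd}}(P_{\mathrm{II}}) \\
    &= \tfrac{1}{4}\left[ \bigl(1 + E_{\mathrm{I}}^{\,n-k}\bigr)\bigl(1 + E_{\mathrm{II}}^{\,k}\bigr)
     + \bigl(1 - E_{\mathrm{I}}^{\,n-k}\bigr)\bigl(1 - E_{\mathrm{II}}^{\,k}\bigr) \right] \\
    &= \tfrac{1}{2}\left[ 1 + E_{\mathrm{I}}^{\,n-k} E_{\mathrm{II}}^{\,k} \right].
\end{align*}
```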



We are ready to compute the Information Causality 
quantity (1) of the main text: 



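\[ I \ge \sum_{k=0}^{n} \binom{n}{k} \left[ 1 - h(P_k) \right] = \sum_{k=0}^{n} \binom{n}{k} \left[ 1 - h\!\left( \frac{1 + E_{\mathrm{I}}^{\,n-k} E_{\mathrm{II}}^{\,k}}{2} \right) \right] \ge \frac{1}{2 \ln 2} \sum_{k=0}^{n} \binom{n}{k} \left( E_{\mathrm{I}}^2 \right)^{n-k} \left( E_{\mathrm{II}}^2 \right)^{k} = \frac{1}{2 \ln 2} \left( E_{\mathrm{I}}^2 + E_{\mathrm{II}}^2 \right)^n, \tag{23} \]

where we have used $1 - h\!\left( \frac{1+y}{2} \right) \ge \frac{y^2}{2 \ln 2}$. Therefore, if

\[ E_{\mathrm{I}}^2 + E_{\mathrm{II}}^2 > 1, \tag{24} \]

there exists $n$ such that $I > 1$.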
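To see how quickly the bound exceeds the communicated single bit, one can evaluate both the binomial sum and its lower bound $(E_{\mathrm{I}}^2 + E_{\mathrm{II}}^2)^n / (2 \ln 2)$ numerically. The following is a minimal sketch; the function names and the search cutoff `n_max` are assumptions made for illustration.

```python
from math import comb, log, log2

def h(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def exact_sum(n, E1, E2):
    """sum_k C(n,k) [1 - h(P_k)] with P_k = (1 + E1^(n-k) E2^k) / 2."""
    return sum(comb(n, k) * (1 - h((1 + E1 ** (n - k) * E2 ** k) / 2))
               for k in range(n + 1))

def lower_bound(n, E1, E2):
    """Right-hand side of the chain of inequalities: (E1^2 + E2^2)^n / (2 ln 2)."""
    return (E1 ** 2 + E2 ** 2) ** n / (2 * log(2))

def smallest_violating_n(E1, E2, n_max=2000):
    """Smallest n <= n_max for which the lower bound exceeds 1, else None."""
    for n in range(1, n_max + 1):
        if lower_bound(n, E1, E2) > 1:
            return n
    return None

print(smallest_violating_n(0.75, 0.75))
print(exact_sum(3, 0.75, 0.75), lower_bound(3, 0.75, 0.75))
```

When $E_{\mathrm{I}}^2 + E_{\mathrm{II}}^2$ is only slightly above 1 the required $n$ grows large, and no finite cutoff can exclude a violation at larger $n$; a `None` return only means that none was found up to `n_max`.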

It is also possible, and not difficult, to show that whenever
$E_{\mathrm{I}}$ and $E_{\mathrm{II}}$ do not violate Information
Causality then there exists a quantum protocol that gives such
correlations.

For the isotropic correlations (6) of the main text,
$E_{\mathrm{I}} = E_{\mathrm{II}} = E$, hence eq. (22) becomes
$P_k = \frac{1}{2} [1 + E^n]$ and eq. (24) becomes $2E^2 > 1$,
as stated in the main text.
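As a short worked restatement of the isotropic condition (our own rewriting, using $E = 2P - 1$ for the probability $P$ that a single box satisfies $A \oplus B = ab$):

```latex
\[
2E^2 > 1
\;\Longleftrightarrow\;
|E| > \frac{1}{\sqrt{2}}
\;\Longleftrightarrow\;
P = \frac{1 + E}{2} > \frac{2 + \sqrt{2}}{4} \approx 0.854
\quad (E > 0).
\]
```

The threshold value $(2 + \sqrt{2})/4 = \cos^2(\pi/8)$ is the familiar maximal quantum success probability for this correlation, so the condition is met exactly by the boxes that exceed it.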



